1 Introduction


This report summarizes research carried out under AFOSR grant #F49620-92-J-0383DEF.
Under AFOSR sponsorship, new schemes for fault tolerance in multiprocessor and
distributed systems have been developed, as described below:

• Design and implementation of fault tolerance schemes for multiprocessor and
distributed systems. We have investigated a number of fault tolerance schemes to
evaluate the performance, reliability and availability trade-offs. Fault tolerance schemes will
be developed for various fault models and application areas. The fault models may be
divided into three classes: (i) fail-stop model, (ii) fail-slow model, and (iii) arbitrary
failure model. The applications are divided into two types: (i) long-running applications
(such as distributed simulations, weather forecasting, etc.) which are expected
to provide results at the end of the computation, and (ii) applications that are long-running
but are also expected to provide results often during the computation. The requirements
of these two application areas are somewhat different, requiring different fault
tolerance techniques.

The  goal  of  our  research  has  been  to  design  unified  approaches  to  deal  with  various  fault 
models  and  experimentally  evaluate  the  performance  of  the  proposed  fault  tolerance 
mechanisms. 

•  Software-implemented  fault  tolerance  for  multiprocessor  systems  such  as  nCUBE  and 
MasPar.  We  are  studying  approaches  for  providing  user-transparent  mechanisms  for 
fault  tolerance.  The  goal  here  is  to  design  and  implement  a  software  library.  The  user 
can  link  the  existing  application  software  to  this  library  and  achieve  the  desired  level 
of  fault  tolerance. 

• Design and development of a new tool for evaluating the reliability and availability
of distributed and multiprocessor systems using various fault tolerance techniques.
Such a tool will facilitate evaluation of the fault tolerance schemes that we propose to
develop.

2  Research  Progress  in  Detail 

This  section  discusses  the  above  three  thrust  areas,  and  also  presents  our  preliminary  work 
in  each  of  the  areas. 
2.1 Fault Tolerance in Multiprocessor and Distributed Systems


Design  and  implementation  of  fault  tolerance  schemes  for  multiprocessor  and  distributed 
systems  is  a  thrust  area  of  this  continuing  research.  We  are  investigating  a  number  of  fault 
tolerance  schemes  for  multiprocessor  and  distributed  systems  to  evaluate  the  performance, 
reliability and availability trade-offs. The fault tolerance mechanism used in such systems
must be chosen based on a number of criteria. The criteria that we consider important
are as follows.

• Reliability and availability requirements. These requirements have a serious impact
on the level of redundancy required. These requirements may be specified
probabilistically (e.g. availability of 99.2%) or deterministically (e.g. tolerate up to two
simultaneous failures). High reliability and availability requirements typically require
higher redundancy.

•  The  fault  model.  The  fault  models  that  are  applicable  to  most  real-life  systems  are: 

(i) fail-stop model. This is the most frequently studied fault model. Here it is assumed
that a faulty processor detects its own failure and stops functioning immediately.
Although easy to understand, realization of this fault model results in significant
hardware overhead.

(ii)  fail-slow  model.  This  model  is  weaker  than  the  fail-stop  model.  Here,  it  is  assumed 
that  a  faulty  processor  detects  its  failure  within  a  certain  time  after  the  fault  occurs. 

(iii) arbitrary failure model. This is sometimes called the Byzantine fault model. Here,
no assumption is made about the behavior of a faulty processor. This fault model is the
easiest to realize. However, achieving fault tolerance is much more difficult than under the
other two fault models.

We propose to investigate fault tolerance schemes for all three models.

• The application. Two types of applications are of particular interest: (i) long-running
applications which are expected to provide results at the end of the computation (e.g.
distributed simulations, weather forecasting, etc.), and (ii) applications that are long-running
but are also expected to provide results often during the computation. The requirements
of these two application areas are somewhat different, requiring different fault
tolerance techniques.

The  goal  of  our  research  is  to  provide  fault  tolerance  approaches  to  match  various 
reliability  and  application  requirements,  and  experimentally  evaluate  the  performance  of 
the  proposed  fault  tolerance  mechanisms.  We  propose  to  develop  a  testbed  to  implement 
a  wide  range  of  fault  tolerance  schemes  for  multiprocessor  and  distributed  environments. 
An  important  objective  here  is  to  provide  a  common  basis  for  experimental  evaluation  and 
comparison  of  various  schemes.  To  illustrate  our  goals,  we  now  present  a  fault  tolerance 
scheme  for  the  following  scenario. 
• Reliability goal: Tolerance of a single failure.

•  Fault  model:  Arbitrary  or  Byzantine  fault  model. 

•  Application:  Long-running  application  providing  results  at  the  very  end. 

This fault tolerance scheme has been implemented for the purpose of evaluating the
performance overhead of fault tolerance. We discuss its implementation and present some
measurements.

The  main  features  of  the  proposed  checkpointing  and  recovery  scheme  [11]  are: 

•  Process  duplication, 

•  Fault  identification  using  retry  on  a  fault-free  processor, 

•  No  output  delays  due  to  checkpointing  or  message  logging. 

Sender-based message logging was put forth in [8] to minimize the cost of achieving
fault tolerance by avoiding logging of inter-process messages on stable storage. There, fault
tolerance is achieved by logging, at the sender, the message as well as its Receive Sequence
Number (RSN). The receive sequence number of a message indicates when the message was
received relative to the other messages received by that receiver. In this scheme, the receiver
informs the sender of the RSN of each received message, and the sender acknowledges receipt
of the RSNs. A receiver process cannot commit any output until it receives the acknowledgements
for the RSNs of all messages consumed before the output was produced.
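The RSN handshake described above can be sketched as follows. This is a minimal illustration, not the implementation from [8]: the class and method names are hypothetical, a single sender and a single receiver are assumed, and message transport is replaced by direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """Keeps every sent message, and later its RSN, in a volatile log."""
    log: dict = field(default_factory=dict)  # SSN -> [payload, RSN or None]
    next_ssn: int = 0

    def send(self, payload):
        ssn = self.next_ssn
        self.next_ssn += 1
        self.log[ssn] = [payload, None]      # RSN is not known yet
        return ssn, payload

    def record_rsn(self, ssn, rsn):
        """Log the RSN reported by the receiver; the return value is the ack."""
        self.log[ssn][1] = rsn
        return ssn


@dataclass
class Receiver:
    next_rsn: int = 0
    unacked: set = field(default_factory=set)  # SSNs whose RSN is not yet acked

    def receive(self, ssn, payload):
        """Assign the next RSN; the pair (ssn, rsn) is reported to the sender."""
        rsn = self.next_rsn
        self.next_rsn += 1
        self.unacked.add(ssn)
        return ssn, rsn

    def on_ack(self, ssn):
        self.unacked.discard(ssn)

    def can_commit_output(self):
        # Output may be committed only when every RSN has been acknowledged.
        return not self.unacked
```

In this sketch, a receiver that has consumed a message reports `can_commit_output()` as false until the sender has logged the RSN and the acknowledgement has come back.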

The Byzantine failure model makes process duplication necessary to achieve single fault
detection. The messages are logged by the sender process in its volatile storage and RSNs
are logged by the receiver process in its volatile storage. Thus, each processor maintains a
send log and a receive log for each process scheduled on that processor. When a processor
fails, undetected errors may be introduced into both volatile logs and the volatile state
of a process executing on the faulty processor. A failure is detected when the messages
sent by replicas of a process mismatch, or when the states of the replicas mismatch at a
checkpoint. Although a failure is detected by such a mismatch, the faulty processor
cannot be identified from the mismatch alone.

Our scheme can be extended to systems where a processor executes multiple processes;
however, in the following, each processor is assumed to be executing a single process.
Therefore, the following uses the terms process and processor interchangeably.


System  Model 

The system consists of multiple processors, as shown in Figure 1. Each processor consists
of a CPU and volatile memory. Each process has two replicas. The replicas of a process are
scheduled on different processors to prevent simultaneous failure of both the replicas. For
example, Figure 1 shows replicas of two processes, P and Q. (The two replicas of a process
are identified by subscripts 1 and 2.)

Processes  communicate  only  through  messages  (no  shared  memory).  Each  process 
may  send  messages  to  other  processes,  as  well  as  consume  messages  sent  by  other  processes. 
All processes are deterministic: if the two replicas of a process start in an identical state and
consume the same set of messages in an identical order, then the replicas will end up in an
identical state. A logical clock is associated with each replica. The logical clock may simply
count  the  number  of  messages  sent  and  consumed  by  a  process  replica  by  incrementing  the 
clock  just  before  sending  or  consuming  a  message.  The  logical  clock  may  be  made  faster 
by  also  incrementing  it  at  other  points  in  the  code.  Logical  clocks  of  both  the  replicas 
are  incremented  at  the  same  logical  points  during  execution.  Each  process  checkpoints 
periodically.  Both  the  replicas  checkpoint  at  the  same  logical  point  in  the  execution,  achieved 
using  the  logical  clock.  Different  processes  checkpoint  independently;  no  coordination  is 
assumed. 
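As a rough illustration of how a message-counting logical clock lets two replicas checkpoint at the same logical point, consider the sketch below. The `Replica` class, the toy integer state, and the rule "checkpoint whenever the clock reaches a multiple of an interval" are assumptions made for the example; the report does not specify how checkpoint points are chosen.

```python
class Replica:
    """One replica of a deterministic process with a message-counting clock."""

    def __init__(self, checkpoint_interval):
        self.clock = 0                       # logical clock
        self.interval = checkpoint_interval  # checkpoint every `interval` ticks
        self.state = 0                       # toy process state
        self.checkpoints = []                # (logical time, saved state)

    def _tick(self):
        # The clock is incremented just before a send or a consume. A
        # checkpoint is taken whenever it reaches a multiple of the interval.
        self.clock += 1
        if self.clock % self.interval == 0:
            self.checkpoints.append((self.clock, self.state))

    def consume(self, value):
        self._tick()
        self.state += value

    def send(self):
        self._tick()
        return self.state


def run(ops, interval=2):
    """Execute a sequence of ("consume", value) / ("send", None) operations."""
    replica = Replica(interval)
    for op, arg in ops:
        if op == "consume":
            replica.consume(arg)
        else:
            replica.send()
    return replica
```

Because both replicas are deterministic and see the same operations in the same order, running `run(ops)` twice yields identical checkpoint lists, which is the property the checkpoint comparison relies on.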


Figure 1: System model


Message  Passing  Mechanism 

Each  fault-free  replica  of  a  process  must  consume  identical  messages  in  the  same  order. 
This  can  be  achieved  using  message  authentication  and  time-outs.  The  message  passing 
mechanism  ensures  that  either  both  fault-free  replicas  of  a  process  receive  a  message,  or 
neither  does.  Also,  it  is  ensured  that  both  the  replicas  receive  the  messages  in  the  same 
order.  Discussion  of  the  message  passing  protocol  is  omitted  here.  A  message  may  be 
consumed  only  when  two  identical  copies  of  the  message  are  received  from  the  two  sender 
replicas.  The  RSN  of  the  message  is  determined  by  when  the  second  copy  of  the  message  is 
received. 
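The consumption rule (two identical copies, with the RSN fixed by the arrival of the second copy) might be sketched as below. The agreement protocol itself, including authentication and time-outs, is omitted, as it is in the text; `ReplicatedInbox` and its return strings are hypothetical names.

```python
class ReplicatedInbox:
    """Receiver-side bookkeeping for messages sent by a replicated sender."""

    def __init__(self):
        self.pending = {}      # (sender, ssn) -> (replica id, payload) of first copy
        self.next_rsn = 0
        self.receive_log = []  # entries (rsn, sender, ssn, payload)

    def deliver(self, sender, replica, ssn, payload):
        key = (sender, ssn)
        if key not in self.pending:
            self.pending[key] = (replica, payload)
            return "waiting"               # only one copy so far
        first_replica, first_payload = self.pending[key]
        if first_replica == replica:
            return "waiting"               # same replica again, still one copy
        del self.pending[key]
        if first_payload != payload:
            return "mismatch"              # the sender process is suspect
        # The RSN is assigned when the second identical copy arrives.
        rsn = self.next_rsn
        self.next_rsn += 1
        self.receive_log.append((rsn, sender, ssn, payload))
        return "consumable"
```

A "mismatch" result corresponds to the situation in which the receiver would later disapprove the sender's checkpoint.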
Each replica of a message's sender retains in a volatile send log a copy of the message.
This copy includes the Sender Sequence Number (SSN) of the message, which indicates the
position of the message in the stream of outgoing messages. The received message also
includes the SSN. Each replica of the receiver process retains in its volatile receive log an
entry containing the message, its RSN and its SSN. These logs need not be saved on
stable storage. If necessary, to free the memory, the logs may be saved (asynchronously) on
a local disk or the stable storage.

A message in the send or receive log is discarded when (i) the sender process has
checkpointed after sending this message and (ii) the receiver process has checkpointed after
consuming this message. The following discusses the two basic steps: (i) fault detection, followed
by (ii) fault identification and recovery.
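The discard rule can be written as a small predicate over a log. The sketch assumes that each log entry records the sender's logical time at the send and the receiver's logical time at the consume, and that each side learns the other's latest checkpoint time by some means not described here; the function name and entry layout are hypothetical.

```python
def prune(log, sender_checkpoint, receiver_checkpoint):
    """Return the log entries that must still be kept.

    Each entry is (payload, sent_at, consumed_at): the sender's logical time
    at the send and the receiver's logical time at the consume (None if the
    message has not been consumed yet). The two checkpoint arguments are the
    logical times of the latest checkpoints of the sender and the receiver.
    """
    kept = []
    for payload, sent_at, consumed_at in log:
        covered_by_sender = sender_checkpoint > sent_at
        covered_by_receiver = (
            consumed_at is not None and receiver_checkpoint > consumed_at
        )
        # An entry is discarded only when both conditions (i) and (ii) hold.
        if not (covered_by_sender and covered_by_receiver):
            kept.append((payload, sent_at, consumed_at))
    return kept
```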

Fault Detection

A process is said to be faulty if one of its replicas has failed. As shown below, the proposed
scheme ensures that the fault-free replica of a faulty process detects the failure before its
next checkpoint is taken. Let P be the faulty process with replicas P1 and P2. Also, let
P1 be the faulty replica. When the fault-free replica P2 detects the failure, it broadcasts a
message to all the processes stating that process P has failed. Note that even if P2 broadcasts a
message that P1 has failed, there is no reason for the other processes to assume that replica
P2 is fault-free (because P2 itself may, in fact, be faulty). When the other processes receive
any such message, the fault identification procedure (presented later) is initiated. Note that
a process is considered to be faulty only when one of its replicas broadcasts the message
that it has failed. The four different ways in which the fault-free replica may detect a
failure are discussed next.

(a) Replica P2 will detect the failure of P1 if P1 does not correctly participate in the
message passing and agreement protocol. On the other hand, if P1 executes the agreement
protocol correctly, then P2 detects a failure in P1 in one of the following three ways:

(b) Both the replicas of a process checkpoint periodically at the same logical time.
All the volatile state is included in the checkpoint; the send and receive logs, however,
are not saved as a part of the checkpoint. Each replica saves its state on stable storage
and compares the state with that of the other replica. The comparison may be performed using
signatures [4]. If the two states do not match, then a failure of one of the replicas is detected.
In our example, if the volatile state of P1 is corrupted due to the failure, then P2 will detect
the failure when it tries to take the next checkpoint. If the states of the two replicas of P
match, then before the checkpoint can be considered valid, the checkpoint of process P
must be “approved” by each process that has received a message from P since P’s previous
checkpoint. A fault-free process Q1 “approves” the checkpoint taken by P only if the two
replicas of P did not send mismatching messages to Q1. A checkpoint is not valid until it is
approved by all relevant processes. A receiver process, Q1, may inform the sender process
of its “disapproval” any time after it receives the mismatching messages. There are two ways
this may occur.

(c) Replica P1 sends message M1 to Q1 and Q2, and replica P2 sends a different message M2.
Both Q1 and Q2 detect the message mismatch and send “disapproval” to P1 and P2. When
P2 receives the disapprovals from both Q1 and Q2, it concludes that process P is faulty. In
this case, Q1 and Q2 cannot determine whether P1 or P2 is faulty.

(d) A situation similar to (c) arises if P1 sends two different messages to Q1 and
Q2. As message passing uses authentication, both Q1 and Q2 detect that P1 has sent them
different messages, and they both send their disapprovals to P1 and P2. As before, when
P2 receives the disapprovals from both Q1 and Q2, it concludes that process P is faulty.
Actually, in this case, Q1 and Q2 both know that P1 has failed. Therefore, Q1 and Q2
can broadcast this information to all the processes, and the fault identification procedure
described later is not required.

Thus, a failure in P1 is detected by P2 by one of the above four means. When a
process receives a message from P2 that “process P is faulty”, the message is interpreted to
mean that “one of the two replicas of process P is faulty” and both the processors executing
the replicas of P are considered suspect.

When the fault in P is detected, the replicas of process P are forced to save their state
on the stable storage at the next increment of their logical clock and stop executing. Let
the logical time at which replica P1 thus saves its state be t1 and the logical time at which
P2 thus saves its state be t2. Note that if the failure is detected as discussed in (b) above,
then t1 = t2; otherwise t1 and t2 may be different. The previous checkpoint of process P is
also retained on the stable storage.
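The checkpoint validation of case (b), together with the approvals from receivers, might look like the sketch below. SHA-256 from Python's `hashlib` stands in for the signature scheme of [4], which is not described here, and the three-valued approval map is an assumption of the example.

```python
import hashlib


def signature(state: bytes) -> str:
    """Compact signature of a checkpointed state (SHA-256 as a stand-in)."""
    return hashlib.sha256(state).hexdigest()


def checkpoint_status(state_1: bytes, state_2: bytes, approvals: dict) -> str:
    """Classify a checkpoint taken by the two replicas of a process.

    `approvals` maps each process that received a message since the previous
    checkpoint to True (approved), False (disapproved) or None (no answer yet).
    """
    if signature(state_1) != signature(state_2):
        return "fault detected"   # replica states differ, case (b)
    if any(answer is False for answer in approvals.values()):
        return "fault detected"   # mismatching messages seen, cases (c), (d)
    if any(answer is None for answer in approvals.values()):
        return "pending"          # not valid until all receivers approve
    return "valid"
```

Comparing signatures rather than full states keeps the amount of data exchanged between the replicas small, at the cost of a negligible chance of a collision.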


Fault Identification and Recovery

The  faulty  processor  is  identified  using  “retry”  of  the  computation  of  the  faulty  process  on 
a  fault-free  processor,  say  processor  R.  Due  to  the  single  undetected  fault  assumption,  all 
the  processors  that  are  not  suspect  can  be  considered  fault-free. 

When  a  process  replica  fails,  any  of  the  following  may  be  corrupted:  send  log,  receive 
log,  messages  sent  to  other  processes  and  the  volatile  state.  The  following  procedure  detects 
all  these  errors. 

To determine which processor executing replicas of P is faulty, first assume that P1
is fault-free. The computation performed by P1 since its previous checkpoint is retried on
R. For this purpose, first the previous checkpoint taken by process P is loaded on R. Also,
the volatile send and receive logs maintained by P1 are sent to R. Then, R executes process
P, starting from the previous checkpoint. The messages are consumed by R in the same
order as indicated by the receive log of P1. The receive log of P1 includes, for each received
message, the identification of its sender and the SSN. Using the sender ID and the SSN in
the receive log of P1, R requests the sender to resend the corresponding message. When
the message is received, it is compared with the copy of the message in P1’s receive log. A
mismatch indicates that P1 is faulty. Also, if the original assumption that P1 is fault-free is
true, then when the computation is retried on R, R must send exactly the same messages
as those sent earlier by P1, if any. If the messages sent by P1 and by R (during retry) do
not match, then P1 must be faulty and thus P2 is identified to be fault-free.

If all the messages sent by R and P1 match, then the state of R at logical time t1
is compared with the state of P1 at t1. Recall that P1 saved its state at logical time t1
after process P was detected to be faulty. If the comparison results in a mismatch, P1 must
be faulty and P2 is considered fault-free. If the comparison results in a match, P1 must be
fault-free and P2 is identified as faulty.

Note that if a process replica has failed, at least one of the following must be
corrupted: send log, receive log, messages sent to other processes and the volatile state. The
above procedure detects errors in each of these. Therefore, the fault identification procedure
correctly identifies the faulty processor.
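The retry-based identification can be summarized as a decision procedure. The sketch below is a simplification under stated assumptions: the process is modeled by a hypothetical deterministic function `toy_process`, the receive log is checked against the senders' copies before the retry rather than message by message during it, and `resend` is a placeholder for asking a sender replica to retransmit from its send log.

```python
def toy_process(checkpoint_state, messages):
    """Deterministic toy process: adds each value and sends the running sum."""
    state, sent = checkpoint_state, []
    for value in messages:
        state += value
        sent.append(state)
    return sent, state


def identify_faulty(checkpoint_state, p1_receive_log, p1_send_log,
                    p1_state_at_t1, resend, process=toy_process):
    """Return "P1" or "P2", whichever replica is identified as faulty.

    p1_receive_log holds (sender, ssn, payload) entries in consumption order;
    p1_send_log holds the payloads P1 sent since the previous checkpoint.
    """
    # Step 1: each receive-log entry must match what its sender resends.
    for sender, ssn, payload in p1_receive_log:
        if resend(sender, ssn) != payload:
            return "P1"
    # Step 2: retry on R from the previous checkpoint, in P1's logged order.
    payloads = [payload for _, _, payload in p1_receive_log]
    r_sent, r_state = process(checkpoint_state, payloads)
    # Step 3: R must send exactly the messages that P1 sent.
    if r_sent != p1_send_log:
        return "P1"
    # Step 4: compare R's state with the state P1 saved at logical time t1.
    return "P2" if r_state == p1_state_at_t1 else "P1"
```

Under the single-fault assumption, a retry that agrees with P1 in every respect leaves P2 as the only possible faulty replica, which is why the last step can return "P2" without examining P2 at all.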

Once  the  fault-free  replica  of  the  faulty  process  is  identified,  the  state  of  the  fault-free 
replica  can  be  copied  to  another  processor,  thereby  creating  two  consistent  replicas  (both  in 
correct  state)  of  process  P.  Thus,  the  system  recovers  from  the  single  processor  failure. 


Experimental  Testbed  and  Evaluation 

An experimental system has been developed to measure the performance degradation caused
by the above fault tolerance mechanism. A software layer has been developed that
implements the algorithm presented above. The software is written in C, and
measurements are carried out on a network of SPARC workstations. The software layer and
measurement methodology are described below.

The  implementation  can  broadly  be  classified  into  the  following  four  modules: 

1. Initiation: Given the number of processes and the host names, this module is
responsible for opening communication channels. For testing purposes we have used a complete
network connection. The communication protocol used is TCP.

2.  User  Processes:  For  measurement  purposes,  user  processes  periodically  send  messages, 
receive  messages,  and  checkpoint.  The  rate  at  which  these  operations  are  performed 
can  be  controlled  by  input  parameters.  The  time  complexity  of  the  user  process  is 
dependent  on  the  number  of  messages  sent  by  this  process. 
3. Message Reception and Processing: This module performs the Byzantine message
agreement protocol required for the scheme.

4. Checkpoint: This module has two components: (i) .chkpnt, and (ii) .compare. The .chkpnt
component checkpoints the process. The chk.freq parameter determines when the process has
to take its checkpoint. For measurement purposes, the checkpoint component dumps
chk.data.size bytes to the disk (stable storage). The chk.freq and chk.data.size parameters are varied
to change the checkpoint frequency and the checkpoint size, respectively. After dumping its state,
if its replica has also taken the checkpoint, checkpoint comparison takes place; otherwise,
the process continues its execution. This is achieved using the .compare component. If the
previous checkpoint has not yet been compared, the process enters the .poll mode. The
chk.poll.freq variable determines how frequently the process should poll the disk
for its replica’s checkpoint. For all the measurements, we have assumed chk.poll.freq
to be half of the chk.freq value.
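The dump-then-compare behavior of the checkpoint module can be illustrated with ordinary files. This is only a sketch of the control flow: the testbed itself is written in C, the file naming scheme and function names below are invented for the example, and a temporary directory plays the role of the shared disk.

```python
import os
import tempfile


def compare(directory, me, peer, seq):
    """Compare my checkpoint `seq` with the peer's, if the peer's is on disk."""
    peer_path = os.path.join(directory, f"{peer}.{seq}")
    if not os.path.exists(peer_path):
        return "pending"                    # caller polls again later
    with open(os.path.join(directory, f"{me}.{seq}"), "rb") as mine, \
            open(peer_path, "rb") as theirs:
        return "match" if mine.read() == theirs.read() else "mismatch"


def take_checkpoint(directory, me, peer, seq, state: bytes):
    """Dump the state to disk, then attempt the comparison once."""
    with open(os.path.join(directory, f"{me}.{seq}"), "wb") as f:
        f.write(state)
    return compare(directory, me, peer, seq)


with tempfile.TemporaryDirectory() as shared_disk:
    # P1 checkpoints first and finds nothing to compare against yet.
    first = take_checkpoint(shared_disk, "P1", "P2", 0, b"state")
    # P2 checkpoints next; both files exist, so the comparison runs.
    second = take_checkpoint(shared_disk, "P2", "P1", 0, b"state")
    # P1, polling later, now also sees the peer's checkpoint.
    polled = compare(shared_disk, "P1", "P2", 0)
```

A replica that gets "pending" would call `compare` again at the polling interval rather than block, which matches the description of the process continuing with its work between polls.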

Experimental  Results 

To  measure  the  overhead  imposed  by  the  fault  tolerance  scheme,  we  measured  two  quantities: 
(i)  total  execution  time  required  to  complete  the  task  without  fault  tolerance,  and  (ii)  total 
execution  time  with  the  fault  tolerance  mechanism.  These  experiments  were  performed 
to  determine  the  execution  overhead  of  the  fault  tolerant  scheme  during  normal  execution 
(without  failures). 

The parameters varied during the different runs were the checkpoint size, checkpoint
frequency, and the length of execution time. The execution time was assumed to be
proportional to the number of messages sent. Thus, by varying the number of messages to be sent
by a process, we obtained different lengths of execution time.

The measurements of the execution times for the various checkpoint sizes and
checkpoint frequencies are shown in Table 1. The execution time for the process without
incorporating fault tolerance is also shown. The Size column of the table refers to the checkpoint
size. The CI columns show the execution times for the various checkpoint intervals. CI=2
means that a checkpoint was taken every 2 messages sent. The Without column shows the
execution time for the process in the absence of fault tolerance.

We made some independent measurements of the time taken by checkpointing
and comparison. Let Tchk be the time taken to checkpoint and perform the checkpoint
comparison. The average value of Tchk was found to be 0.85 secs for a checkpoint size of 100
Kbytes.

Table 1: Execution times

Num of   Size      CI=2     CI=5     CI=20    Without
Msgs     (bytes)   (secs)   (secs)   (secs)   (secs)

20       5K        7        7        -        6
         10K       9        7        -
         50K       10       8        -
         100K      13       10       -

100      5K        33       26       15       13
         10K       40       29       17
         50K       57       30       18
         100K      64       34       19

200      5K        65       45       31       28
         10K       72       48       33
         50K       117      54       35
         100K      125      69       39

From Table 1, we can estimate the overhead due to checkpointing, checkpoint comparison,
and the Byzantine agreement. Let Ttotal be the total overhead and N be the number
of checkpoints; the part of Ttotal not accounted for by N * Tchk is then the overhead
due to the Byzantine agreement protocol. Table 2
lists some calculations for a checkpoint size of 100 Kbytes.


Table 2: Checkpoint Overhead

Num of   With FT   Without   Ttotal   N    Tchk     N * Tchk   Ttotal - N * Tchk
Msgs     (secs)    (secs)    (secs)        (secs)   (secs)     (secs)

100      19        13        6        4    0.85     3.4        2.6
200      39        28        11       9    0.85     7.65       3.35
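The arithmetic behind Table 2 can be reproduced with a few lines of code. The sketch assumes that the total overhead is the difference between the CI=20 execution time and the execution time without fault tolerance for 100 Kbyte checkpoints, as the values in Table 1 suggest; the value N = 9 for 200 messages is inferred from 7.65 / 0.85. The function name is hypothetical.

```python
def decompose_overhead(with_ft, without_ft, n_checkpoints, t_chk):
    """Split the total overhead into a checkpointing part and the remainder.

    Returns (t_total, checkpoint_part, agreement_part), all in seconds.
    """
    t_total = with_ft - without_ft            # total overhead
    checkpoint_part = n_checkpoints * t_chk   # N * Tchk
    return t_total, checkpoint_part, t_total - checkpoint_part


# Rows of Table 2 (checkpoint size 100 Kbytes, Tchk = 0.85 secs).
rows = {
    100: decompose_overhead(19, 13, 4, 0.85),
    200: decompose_overhead(39, 28, 9, 0.85),
}
```

For 100 messages this gives a total overhead of 6 secs, of which about 3.4 secs is checkpointing and about 2.6 secs is left for the agreement protocol; for 200 messages the split is about 7.65 and 3.35 secs out of 11.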

We  propose  to  perform  similar  experiments  using  different  fault  tolerance  schemes 
that  we  will  develop. 

2.2 Software Implemented Fault Tolerance in Parallel Computers

Over the past decade, the push for higher throughput from existing technology has forced
parallel computing systems into the mainstream. Hundreds of parallel systems
representing millions of invested dollars are currently in use. Table 3, taken from a recent
article in the IEEE Spectrum [12], shows that over one thousand parallel machines have
been sold, with 1991 sales totaling more than two hundred million dollars. However, in spite of their
popularity, these systems are largely unsuitable for applications demanding high reliability
or prolonged availability due to high failure rates.

Company             Number sold to date   1991 Sales in Millions

Intel               > 325                 $90
Meiko Scientific    > 425                 25
nCUBE               > 300                 18
Parsytec GmbH       100                   8
Thinking Machines   90                    85

TOTAL               > 1,240               226

Table 3: Sales of Parallel Computers

To some it may come as a surprise that parallel systems suffer from high failure rates,
given the common assumption that such systems are inherently reliable. The following
quotation, taken from a recent publication by Harper and Lala [6], points out the error in
this logic.

The  assertion  is  often  made  that  parallel  processors  are  intrinsically 
reliable,  fault  tolerant,  and  reconfigurable  due  to  their  multiplicity  of 
processing  resources.  In  fact,  the  only  intrinsic  attribute  guaranteed  by 
multiple  processors  is  a  higher  total  failure  rate. 

Thus,  it  is  apparent  that  parallel  systems  are  in  desperate  need  of  fault-tolerant 
features  if  they  are  to  provide  the  same  level  of  dependable  service  as  their  uniprocessor 
ancestors.  In  all  practicality,  this  need  eventually  becomes  a  requirement  as  parallel  systems 
grow  in  scale. 
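The point made in the quotation can be stated as a short calculation. Assume, purely for illustration, that each of n processors fails independently at a constant rate λ and that, without fault tolerance, the failure of any one processor fails the whole computation. Then:

```latex
\lambda_{\mathrm{sys}} = n\,\lambda,
\qquad
\mathrm{MTTF}_{\mathrm{sys}} = \frac{1}{n\,\lambda}
                             = \frac{\mathrm{MTTF}_{\mathrm{proc}}}{n}
```

With hypothetical figures of n = 1024 processors and a per-processor MTTF of 100,000 hours, the system MTTF would be roughly 98 hours, which shows why the need grows with scale.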

Potential  Solutions 

One solution is to provide some level of redundancy within the hardware. Although advances
are being made in the realm of fault-tolerant parallel computing architectures, it will be some
time before they are commercially available. Meanwhile, sales of existing parallel systems
are growing along with the expansion of their application libraries. When the new architectures
finally do hit the market, the necessary additional hardware and increased complexity
will significantly compound their cost. Therefore, like the necessities that brought parallel
computing systems to the forefront, there is a need for cost-effective fault-tolerance from
existing parallel systems. Software implemented fault-tolerance (SIFT) is one very promising
solution. Currently, however, there are two problems plaguing SIFT: there are no easy
ways to utilize existing approaches and no easy ways to experiment with new approaches.

There have been numerous approaches proposed for the provision of hardware
fault-tolerance within software, such as recovery blocks, duplex, TMR, and checkpointing schemes.
However, in order to utilize these schemes, programmers must explicitly incorporate them
into applications. For parallel programmers this compounds the already daunting task of
parallel programming. In addition, little help comes in the form of special programming
languages, because designers of most parallel languages have largely ignored the issue of
fault-tolerance in favor of high performance. Explicit implementation of fault-tolerance
schemes within existing applications requires explicit modification, which is a very difficult
and costly undertaking. Thus, requiring that parallel programmers explicitly handle
fault-tolerance makes its incorporation into new and existing applications intolerably difficult and
expensive.

Because of the difficulty of realizing SIFT approaches, experimentation and testing of
new and existing approaches is severely hampered. Clearly, in order to qualify a new or
existing SIFT approach as effective, it must be implemented and tested. Presently, researchers
are required to spend countless hours programming new SIFT approaches into specific
applications in order to evaluate them. This slows down the research process and delays
the achievement of inexpensive, effective fault-tolerance on existing parallel systems.

All these difficulties can be overcome by implementing fault-tolerance in an
application-independent, user-transparent, modular software layer. By handling all of the requisite
duplication, comparison, recovery, and synchronization chores necessary for fault-tolerant
operation, such a layer could make it possible to execute new and existing applications
reliably without explicit programmer intervention. Likewise, the layer could be made flexible
enough to allow easy modification of the SIFT approach used, thus making it amenable to
experimentation with new approaches.


Research  Goal 

The main goal of this on-going research is to develop the aforementioned software layer for
the purpose of providing dependable operation at minimal cost on numerous existing parallel
computing systems, as well as the provision of a flexible framework within which to explore
new fault-tolerance approaches.

The  following  sections  present  the  various  aspects  of  the  proposed  research.  The 
preliminary  requirements  of  the  SIFT  layer  are  discussed. 
Preliminary Requirements

If a software implementation is to succeed, it must follow complete and precise design
requirements. The following is a collection of the more general preliminary requirements
presently established for the SIFT layer.

(1)  It  should  be  developed  on  top  of  the  existing  system  software.  The  purpose  for 
this  requirement  is  to  prevent  the  layer  from  becoming  too  hardware  dependent,  which  would 
limit  portability  and  reduce  maintainability. 

(2) It should provide the user the capability to choose which fault-tolerant approach
is to be used. This requirement has a two-fold purpose. First, it would allow the
user to control the level of fault tolerance according to the importance of the application
being executed. Second, it would provide researchers the ability to test and compare the
characteristics of various fault-tolerant approaches simply and effectively.

(3)  It  should  be  independent  of  the  interconnection  topology  of  the  target  parallel 
computer.  The  purpose  of  this  requirement  is  the  same  as  that  for  (1). 

(4)  It  should  not  be  application  dependent.  The  purpose  for  this  requirement  is  to 
guarantee  that  the  programmer  need  not  handle  the  fault-tolerance  explicitly. 

(5) It should allow existing programs to be run without modification. This
requirement ensures that the software layer will be transparent to the user.

(6) It should be based on a set of primary functions collectively referred to as the
SIFT-kernel. This requirement implies that the layer be divided into two primary modules:
the SIFT-functional layer and the SIFT-kernel. The SIFT-kernel should contain routines
that provide a uniform, hardware-independent interface to the SIFT-functional layer. The
SIFT-functional layer should contain only the functions that are specific to the SIFT
approach being used and should not bypass the SIFT-kernel to reach the underlying system.
This serves two main purposes. First, it improves portability and maintainability by
limiting hardware-specific modifications to the SIFT-kernel. Second, it requires that the
layer be constructed in a modular design so that new fault-tolerant approaches can be easily
constructed from available function primitives.

(7) It should be amenable to formal representation for the purpose of formal
verification. This requires that the program designers develop a clean implementation that can
be formally represented by a set of assertions, so that implementation and design faults
can be revealed using function-deterministic tests. This is a very important
requirement since the level of dependability provided to the application level is only as good
as the dependability of the SIFT layer. Thus, it is imperative that the layer be thoroughly
tested for intrinsic faults.

(8) It should permit fault injection for the purpose of dependability validation. This
requirement ensures that there is a means for validating the software layer.

Figure 2: Example of the SIFT Layer Utilizing a Duplex Approach


From the preliminary requirements, it is possible to visualize the relationship and
interaction between a user application, the proposed SIFT layer, and the operating system.
For example, Figure 2 shows two nodes of a multiprocessing computer running a user
application on top of the proposed SIFT layer. In this example, the SIFT layer is configured to
use the duplex approach.

As is apparent from the figure, the application program has direct access to the
operating system for system calls not pertinent to fault tolerance (such as the acquisition
of a file handle), but calls that do require special attention (such as the spawning of a new
process) are intercepted by the SIFT layer and handled transparently. When it is necessary
to synchronize the duplexed applications or make a comparison, the peer SIFT layers can
communicate over the interconnection network using ordinary system calls, unbeknownst to
the duplexed applications.
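The layering described above can be illustrated with a minimal sketch. It is not the report's implementation: the class names `SiftKernel`, `LocalKernel`, and `DuplexLayer`, and the method `run_replica`, are hypothetical and chosen only for illustration. The sketch assumes that a replica's result can be compared for equality and that a mismatch is simply reported as a detected error.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Sequence


class SiftKernel(ABC):
    """Hypothetical hardware-independent interface used by the functional layer."""

    @abstractmethod
    def run_replica(self, node: int, task: Callable[[], Any]) -> Any:
        """Run `task` on the given node and return its result."""


class LocalKernel(SiftKernel):
    """Stand-in kernel that runs every replica in the current process."""

    def run_replica(self, node: int, task: Callable[[], Any]) -> Any:
        return task()


class DuplexLayer:
    """Functional layer for the duplex approach: duplicate, then compare."""

    def __init__(self, kernel: SiftKernel, nodes: Sequence[int] = (0, 1)):
        self.kernel = kernel
        self.nodes = nodes

    def execute(self, task: Callable[[], Any]) -> Any:
        # All access to the underlying system goes through the kernel.
        results = [self.kernel.run_replica(n, task) for n in self.nodes]
        if results[0] != results[1]:
            raise RuntimeError("duplex mismatch: error detected")
        return results[0]
```

Porting such a layer to another machine would, under these assumptions, mean writing a new `SiftKernel` subclass while leaving `DuplexLayer` unchanged.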

Example  SIFT  Approaches 

As was stressed earlier, one objective of the proposed SIFT layer is that it be flexible enough
to allow the implementation and testing of a wide range of new and existing approaches to
fault tolerance. Some examples of the possible SIFT approaches that could be implemented and
tested within the proposed SIFT layer are duplex, TMR, New Roll-Forward checkpointing
[10], and Skew Roll-Forward checkpointing.
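As one illustration of how a different approach could slot into the same framework, the following is a minimal sketch of the majority vote at the heart of TMR. The function name `tmr_vote` is hypothetical, and the sketch assumes replica results are hashable values that can be compared directly.

```python
from collections import Counter
from typing import Hashable, Sequence


def tmr_vote(results: Sequence[Hashable]) -> Hashable:
    """Return the value produced by at least two of three replicas."""
    if len(results) != 3:
        raise ValueError("TMR expects exactly three results")
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        # All three replicas disagree, so no majority exists.
        raise RuntimeError("no majority among replica results")
    return value
```

A single faulty replica is masked, for example `tmr_vote([7, 7, 9])` returns `7`, whereas the duplex approach can only detect the disagreement.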

Example Implementation of the SIFT Layer on the nCUBE

The  nCUBE  has  been  chosen  as  the  target  machine  for  the  initial  development  of  the  SIFT 
layer  because  Texas  A&M  is  currently  in  possession  of  a  64-node  nCUBE  2.  The  following 
is  a  brief  overview  of  relevant  characteristics  of  the  nCUBE. 

The nCUBE computer is a distributed memory, message passing multiprocessor. Its
processing nodes are connected via a hypercube interconnection network. Every node within
the network shares the same system clock, but each executes instructions from its own
memory independently from the rest. However, it is possible to synchronize programs running
on separate nodes through an exchange of messages.

The  nCUBE  is  not  a  stand-alone  computer,  but  requires  a  host  to  act  as  a  user 
interface.  The  host  communicates  with  the  nCUBE  via  a  program  on  a  special  Platform 
Interface  Board  called  VORTEX.  Each  node  within  the  nCUBE  runs  its  own  copy  of  a 
UNIX-like  operating  system  called  VERTEX.  This  operating  system  provides  a  program 
the  capability  to  start  a  new  process,  duplicate  a  current  process,  suspend  a  process,  and 
restart  a  process. 

Programs can be loaded onto the nodes from three different environments: a shell
on the host computer, a program running on the host, and a program running on a node.
Parallel programs written for the nCUBE typically consist of a collection of program elements
that execute on separate processors, each accomplishing its own portion of the overall task.
These program elements can also communicate with each other over the interconnection
network.

The VERTEX operating system contains a number of basic commands and features
that together form a sturdy platform for the SIFT-kernel. Notable among these are
commands that do the following: (i) allow a process on one node to load and execute a
process on another node, (ii) synchronize processes on separate nodes using messages, (iii)
exactly duplicate a running process, and (iv) suspend and restart a process. In addition, the
nCUBE system software includes an interactive source and symbolic debugger that allows
programmers to examine variables, data, call stacks, procedure arguments, message queues,
and registers, thus providing the means for software testing. It is therefore evident that the
VERTEX operating system provides some of the functionality necessary for the realization of the
SIFT-kernel layer.

2.3  The  Reliable  Architecture  Characterization  Tool 

Synthesizing an architecture and evaluating its dependability are perhaps the two
greatest challenges that the designers of fault-tolerant computing systems must face. Many
highly dependable systems being realized today utilize one of several proven configurations
consisting of redundant processors and memories interconnected with some form of logic to
detect, correct or mask errors and remove failed modules. Systems making use of N-modular
redundancy, duplication and comparison, standby sparing or system-wide coding are
familiar members of this class of fault-tolerant multiprocessor architectures. The reliability and
availability metrics of these systems have typically been validated either through
combinatorial approaches such as fault-trees and reliability block diagrams, or through Markov
modeling. The procedure for the synthesis and evaluation of a fault-tolerant architecture
is frequently iterative in nature, terminating when the particular system design has been
optimized to meet specifications for dependability, performance, cost, size, weight and power
consumption. It is therefore essential that automated methods of analyzing this family of
reliable architectures be at hand to assist in the design procedure.

A REliable Architecture Characterization Tool (REACT) is currently being
developed to meet this need for a generalized simulation tool that can analyze the high-level
dependability metrics of a variety of fault-tolerant computer designs [2]. Incorporating
detailed system, workload and fault/error models into the integrated framework of a testbed,
this software can be more accurate and easier to use than many tools based on analytical
approaches. Because it facilitates precise estimation of reliability and availability early in
the concept and design phase of system development, this tool will potentially enable the
engineer to synthesize an architecture that matches specifications better than is possible with
the more traditional analytical techniques.

We  are  currently  extending  REACT  to  aid  in  reliability  and  availability  evaluation 
of  multiprocessor  and  distributed  systems. 

Features  of  REACT 

REACT is a software testbed that performs automated life testing of many user-defined
multiprocessor architectures through simulated fault-injection. During a single simulation run,
the code conducts a certain number of experiments or trials in which an initially fault-free
system is operated until it fails or reaches a specified censoring time. The exact number of
trials required is determined by the desired confidence intervals about the system
dependability attribute being measured. The censoring time dictates the maximum operational
lifetime of interest for the given system. Those trials in which the system remains functional
beyond the censoring time are terminated, thereby shortening the run-time of the
simulation without affecting the measurements of interest. Extensive instrumentation has been
included in the program to collect data from each trial, which is later aggregated over the
entire simulation run in order to generate the outputs. Graphs of reliability or availability,
a comprehensive failure mode report and various statistical measurements are provided as
output by REACT. The software now consists of approximately 10,000 lines of C running
under UNIX and completes a “typical” simulation run in a few hours on an engineering
workstation.
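The trial loop described above can be sketched as follows. This is an illustrative Monte Carlo loop, not REACT's code: the function name `life_test`, its parameters, and the stopping rule (a normal-approximation confidence interval on the estimated reliability at the censoring time) are assumptions made for the example. The caller supplies `sample_lifetime`, a hypothetical function that draws one system lifetime.

```python
import math
import random


def life_test(sample_lifetime, censor_time, half_width=0.02, z=1.96,
              min_trials=100, max_trials=100_000, rng=None):
    """Estimate reliability at `censor_time` from repeated simulated trials.

    Returns (estimated reliability, number of trials run).
    """
    rng = rng or random.Random()
    survived = 0
    trials = 0
    while trials < max_trials:
        trials += 1
        # A trial ends at system failure or at the censoring time,
        # whichever comes first; survivors are counted, not simulated further.
        if sample_lifetime(rng) >= censor_time:
            survived += 1
        if trials >= min_trials:
            r = survived / trials
            # Stop once the confidence interval is narrow enough.
            if z * math.sqrt(r * (1.0 - r) / trials) <= half_width:
                break
    return survived / trials, trials
```

For instance, with exponentially distributed lifetimes of mean 1 and a censoring time of 1, the estimate should approach exp(-1), roughly 0.37. Note that the normal approximation is weak when the estimate is near 0 or 1, so a production tool would likely use a more careful interval.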

Figure 3: Class of Architectures REACT Can Analyze


System  Model 

Presently, REACT can analyze the class of architectures consisting of one or more
processor modules (P) interconnected via buses (B) to one or more memory modules (M)
through a block of error-control logic, as pictured in Figure 3. Any number of processors
and memories may be specified, and each can be designated as initially active or as a hot
or cold standby spare. Homogeneous groups of processors or memories may be defined in
which all modules operate redundantly. The processors in a group execute the same workload
in lock-step synchronization; memories in a group have identical contents and are accessed
simultaneously.

The error-control logic may be built from various combinations of components
often found in fault-tolerant designs, such as voters, comparators, switches and error
detecting/correcting codes. Custom error-control logic circuitry may also be specified by the
user. This flexibility allows the system model to represent a variety of multiprocessor
designs utilizing multiple levels of passive, active or hybrid redundancy, or coding to achieve
fault tolerance.

A functional-level abstraction is used in modeling the operation of both processor and
memory modules. The state of a processor is defined by the values driven on its data and
address buses and is determined by inputs it receives over the bus. The state of a memory
is defined by the contents of its bit-array, with the functionality of its addressing-logic being
simulated on every access. In order to reduce some of the unnecessary complexity in the
system model, logic values 0 and 1 are not differentiated: only “error-free” and “erroneous”
states exist for each bit. Memory depth is variable, and word width for memory and all data
paths may be changed with minor modifications in the code.

Workload  Model 

A synthetic workload is assumed in which processors continually perform computation cycles
consisting of an instruction fetch, a possible operand read, a computation and a possible
result write. Real code and data are not used by REACT, but errors are allowed to
propagate throughout the system as if the application program were actually being executed.
REACT is an event-driven simulator, since only those computation cycles in which errors
both propagate and change the erroneous state of the system need to be simulated.

Because several years of system operation may be simulated during a single trial,
average workload characteristics are utilized. Behavior of the application workload is
specified by a mean instruction execution rate, the probabilities of performing a data read and
write per instruction, plus a locality-of-reference model. By definition, one memory read to
fetch an instruction is made every computation cycle. Values for the mean number of data
accesses made during the execution of an instruction may be obtained either through trace
analysis or directly from the measurement of operational hardware. It is assumed that all
memory references access one whole word.
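A minimal sketch of one synthetic computation cycle, under the workload parameters just described, might look as follows. The names `computation_cycle`, `mean_accesses`, `p_read`, and `p_write` are hypothetical; the sketch only shows how per-instruction read and write probabilities translate into memory accesses, and it ignores instruction rate and locality.

```python
import random


def computation_cycle(p_read, p_write, rng):
    """Return the memory accesses made during one computation cycle."""
    accesses = ["fetch"]            # one instruction fetch every cycle
    if rng.random() < p_read:
        accesses.append("read")     # possible operand read
    if rng.random() < p_write:
        accesses.append("write")    # possible result write
    return accesses


def mean_accesses(p_read, p_write):
    """Expected number of whole-word memory accesses per cycle."""
    return 1.0 + p_read + p_write
```

With assumed values `p_read = 0.3` and `p_write = 0.1`, the expected number of memory accesses per cycle is 1.4.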

The memory locations accessed during a computation cycle are determined
via the locality-of-reference model. The testbed implements a model based on Bradford-Zipf
distributions, which have proven to be representative of memory access behavior [1].
This locality model suggests that $\alpha \times 100\%$ of all accesses go to $\beta \times 100\%$ of the
memory under the condition $\alpha + \beta = 1$. Heising first reported what was deemed an
“80/20 Rule” when parameter values $\alpha = 0.8$ and $\beta = 0.2$ were observed to hold for many
commercial applications [7]. Reference addresses are assumed to be uniformly distributed
inside and outside of the locality; the model makes no attempt to separate code from data
in memory.
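The 80/20 locality rule can be sketched as a simple address sampler. This is an illustration rather than the testbed's code: the function name `sample_address` is hypothetical, `alpha` is the fraction of accesses that fall inside the locality, `beta` is the fraction of memory the locality occupies, and placing the locality at the lowest addresses is an assumption made purely for simplicity.

```python
import random


def sample_address(memory_words, alpha=0.8, beta=0.2, rng=random):
    """Pick a word address so that about `alpha` of accesses hit `beta` of memory."""
    hot_size = max(1, int(beta * memory_words))
    if rng.random() < alpha or hot_size == memory_words:
        # Uniformly distributed inside the locality.
        return rng.randrange(hot_size)
    # Uniformly distributed outside the locality.
    return rng.randrange(hot_size, memory_words)
```

Sampling many addresses from a 1000-word memory with the default parameters should place roughly 80% of them in the first 200 words.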

Fault and Error Model

The fault and error model employed by REACT accounts for permanent, intermittent and
transient faults in the processors plus permanent and transient faults in the memories and
the error-control logic. Faults with inter-arrival times that are sampled from a Weibull
distribution (of which the exponential distribution is a special case) are injected into these
modules only at the beginning of a computation cycle. Repair times for failed modules follow
a log-normal distribution after a fixed logistics delay.
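The two timing distributions named above can be sampled with Python's standard `random` module, as in this minimal sketch. The function names and parameter names are hypothetical, and the parameter values in the usage note are placeholders rather than figures from the report.

```python
import random


def next_fault_interval(scale, shape, rng):
    """Draw a Weibull-distributed time until the next fault.

    With shape == 1 the Weibull reduces to an exponential whose mean is `scale`.
    """
    return rng.weibullvariate(scale, shape)


def repair_time(logistics_delay, mu, sigma, rng):
    """Fixed logistics delay followed by a log-normally distributed repair."""
    return logistics_delay + rng.lognormvariate(mu, sigma)
```

For example, `next_fault_interval(1000.0, 1.0, random.Random(1))` draws from an exponential distribution with a mean of 1000 time units, and `repair_time(24.0, 0.0, 0.5, random.Random(1))` always exceeds the 24-unit logistics delay.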

The exact behavior of a processor in the presence of faults can only be determined
with a complex architectural model for that particular processor. In order to preserve
generality of the testbed, detailed knowledge of the processor architecture is not mandatory.
Instead, it is assumed that processor fault effects are completely characterized by the rate at
which errors appear on its memory bus. Three types of errors exist: transients lasting only
one computation cycle, intermittents with a Weibull distributed duration, and permanents
which have an effect in every computation cycle. Errors may affect addresses, (write) data,
or both addresses and data simultaneously. An erroneous address is assumed to access
a random memory location while erroneous data take on a random value. In addition,
erroneous processor reads are assumed to generate output errors in the same computation
cycle. Several fault-injection experiments on actual processors have obtained results which
support this functional processor abstraction by providing measurements for its parameters
[3, 5, 9].

Memory faults are divided among the bit-array and addressing-logic regions of a
memory module. The fraction of faults which fall into each of these regions may be approximated
by their relative chip areas. Bit-array faults are assumed to affect a single random bit in a
word at a random address, while a random location is referenced during an addressing-logic
fault. A transient bit-array fault may be overwritten (changing it from the erroneous to
error-free state) at any time, but a permanent can never be overwritten. Addressing-logic
transients last one computation cycle and permanents will cause the memory module to
endlessly access random words. An access to a random address reads or writes a value with
randomly corrupted bits, corresponding to the difference in bit values between the word that
was accessed and the word that should have been accessed. Finally, faults within one of the
error-control logic components are assumed to affect a single random bit either permanently
or for one computation cycle (in the case of transients).
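The idea that a misdirected access yields corrupted bits "corresponding to the difference in bit values" can be made concrete with a bitwise XOR, as in this minimal sketch. The function name `error_mask` and the default word width are assumptions for illustration. Since REACT does not track real data values, the tool itself would presumably draw the corrupted bit positions at random; the sketch only shows where such a pattern comes from when actual words are known.

```python
def error_mask(intended_word: int, accessed_word: int, width: int = 32) -> int:
    """Return a mask whose 1 bits mark positions that differ between the words."""
    return (intended_word ^ accessed_word) & ((1 << width) - 1)
```

For a 4-bit word, an intended value of `0b1010` and an accessed value of `0b0110` differ in the two high-order bits, so the mask is `0b1100`.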

Extensions of REACT for Multiprocessor and Distributed Systems

REACT is a very useful tool for evaluating a fault-tolerant architecture that can be modeled
as in Figure 3. Such an architecture is suitable for implementing a single node in a distributed
or multiprocessor system. The architecture used for a node determines which fault model
may be used for that node (i.e. fail-stop, fail-slow, or Byzantine). Presently, REACT can
evaluate only a single such node. We propose to extend REACT to allow evaluation
of a multiprocessor or distributed system consisting of multiple such nodes. We propose
that the extended REACT will take into account the fault tolerance scheme used by an
application executing on the multiple nodes. Thus, reliability and availability will not only
be a function of the architecture of each individual node, but will also depend on the fault
tolerance mechanism used by the application.
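To see why system-level reliability depends on the fault tolerance scheme as well as on the node, consider the standard textbook closed forms below. They are not REACT's method, which is simulation based, and they rest on strong assumptions stated here: node failures are independent, every node has the same reliability `r` over the mission time, and voting or failure detection is perfect. The function names are hypothetical.

```python
def simplex_reliability(r: float) -> float:
    """A single node with no redundancy."""
    return r


def tmr_reliability(r: float) -> float:
    """Three nodes with majority voting: at least two of three must work."""
    return 3 * r**2 - 2 * r**3


def one_of_two_reliability(r: float) -> float:
    """Two fail-stop nodes where either one alone suffices."""
    return 1 - (1 - r) ** 2
```

With a node reliability of 0.9, the same node architecture yields 0.9 for a single node, 0.972 under TMR, and 0.99 for the fail-stop pair, which illustrates how the scheme changes the result.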

3 Synopsis of Future Research

The  thrust  of  our  future  research  is  on  issues  related  to  fault  tolerant  multiprocessor  and 
distributed  systems. 

3.1  Fault  Tolerance  Schemes  for  Multiprocessor  and  Distributed 
Systems 

Design and implementation of fault tolerance schemes for multiprocessor and distributed
systems is a thrust area of this proposal. We propose to investigate a number of fault
tolerance schemes for multiprocessor and distributed systems to evaluate the performance,
reliability and availability trade-offs. The fault tolerance mechanism used in such systems
must be chosen based on a number of criteria. The criteria that we consider important
are as follows.

•  Reliability  and  availability  requirements.  These  requirements  have  a  serious  impact  on 
the  level  of  redundancy  required.  These  requirements  may  be  specified  probabilistically 
(e.g.  availability  of  99.2%)  or  deterministically  (e.g.  tolerate  up  to  two  simultaneous 
failures). 

• The fault model. The fault models that are applicable to most real-life systems are:
(i) fail-stop model, (ii) fail-slow model, (iii) arbitrary or Byzantine failure model. We
propose to investigate fault tolerance schemes for all three models.

• The application. Two types of applications are of particular interest: (i) long-running
applications which are expected to provide results at the end of computation (e.g.
distributed simulations, weather-forecasting, etc.), and (ii) applications that are long-running
but are also expected to provide results often during the computation. The
requirements of these two application areas are somewhat different, requiring different fault
tolerance techniques.

A goal of the proposed research is to provide fault tolerance approaches to match
various reliability and application requirements, and to experimentally evaluate the performance
of the proposed fault tolerance mechanisms. We propose to develop a testbed to implement
a wide range of fault tolerance schemes for multiprocessor and distributed environments,
with the goal of providing a common basis for experimental evaluation and comparison of
various schemes.

3.2 Software-Implemented Fault Tolerance for Multiprocessors


A major goal of the proposed research is to develop a software layer that provides
dependable operation at minimal cost on numerous existing parallel computing systems
(such as nCUBE and MasPar) and that offers a flexible framework within
which to explore new fault-tolerance approaches. We propose to provide user-transparent
Software-Implemented Fault Tolerance (SIFT) for hardware failures. We propose to develop
a SIFT layer that will be located between the operating system and the user application.
The following is a collection of requirements that will be satisfied by the SIFT layer.

(1)  The  SIFT  layer  should  reside  on  top  of  the  existing  system  software. 

(2)  It  should  allow  the  user  to  choose  the  fault-tolerant  approach  to  be  used. 

(3)  It  should  be  independent  of  the  interconnection  topology  of  the  target  parallel 
computer. 

(4)  It  should  be  application  independent. 

(5)  It  should  allow  existing  programs  to  be  run  without  modification. 

(6)  It  should  be  based  on  a  set  of  primary  functions  collectively  referred  to  as  the 
SIFT-kernel. 

(7) It should be amenable to formal representation for the purpose of formal
verification.

(8)  It  should  permit  fault  injection  for  the  purpose  of  dependability  validation. 

Because an nCUBE machine is available at Texas A&M University, we are
developing the SIFT layer on the nCUBE. The layer is also intended to be portable to other
multiprocessors.


3.3  Analysis  Tool  for  Multiprocessor  and  Distributed  Systems 

The different fault tolerance mechanisms proposed for development during the course of this
research need to be evaluated to determine the level of reliability and availability achieved
using those mechanisms. We have already developed a tool, REACT, which is very useful
for evaluating most fault-tolerant architectures used to implement a single node in a
multiprocessor or distributed system. The architecture used for a node determines which fault
model may be used for that node (i.e. fail-stop, fail-slow, or Byzantine). Presently, REACT
can evaluate only a single such node.

We are currently in the process of extending REACT to allow evaluation of a
multiprocessor or distributed system consisting of multiple such nodes. The extended REACT
takes into account the fault tolerance scheme used by an application executing on the
multiple nodes. Thus, reliability and availability will not only be a function of the architecture
of each individual node, but will also depend on the fault tolerance mechanism used by the
application.

In summary, this tool is useful in evaluating the reliability achieved by the fault
tolerance schemes that we will propose.

and G. D. Holland), accepted in 1994 International Conference on Parallel Processing.

3. “Synthesis of Initializable Asynchronous Circuits” (with S. Chakradhar, S. Banerjee
and R. Roy), International Conference on VLSI Design, Calcutta, India, December
1993.

4. “Recovery in Distributed Mobile Environments” (with P. Krishna and N.H. Vaidya),
IEEE Workshop on Advances in Parallel and Distributed Systems, October 1993.

5. “A Method to Derive Compact Test Sets for Path Delay Faults in Combinational
Circuits” (with J. Saxena), 1993 International Conference on Computer Design,
Cambridge, Massachusetts, pp. 518-522, October 4-6, 1993.

6. “Design for Testability of Asynchronous Sequential Circuits” (with J. Saxena),
International Test Conference, Baltimore, October 17-21, 1993.

7. “Fast and Efficient Strategies for Cubic and Non-Cubic Allocation in Hypercube
Multiprocessors” (with D. Das Sharma), 1993 International Conference on Parallel
Processing, Chicago, August 1993.

8. “A Synthesis and Evaluation Tool for Fault-Tolerant Multiprocessor Architectures”
(with J. Clark), Annual Reliability and Maintainability Symposium, pp. 428-435,
January 1993.

9. “Buffer Assignment for Data Driven Architectures” (with M. Chatterjee),
International Conference on Computer Aided Design '93, November 1993.

10. “A Fast and Efficient Strategy for Submesh Allocation in Mesh-Connected Parallel
Computers” (with D. Das Sharma), 5th IEEE Symposium on Parallel and Distributed
Processing, December 1993.

11. “Optimal Broadcasting in de Bruijn Networks and Hyper-de Bruijn Networks” (with
E. Ganesan), International Parallel Processing Symposium, April 1993.

12. “Degradable Agreement in the Presence of Byzantine Faults” (with N. Vaidya), 13th
International Conference on Distributed Computing Systems, Pittsburgh,
Pennsylvania, May 1993.

N.S. Bowen), IEEE Transactions on Computers, to appear.

A. Mendelson), IEEE Transactions on Computers, Vol. 42, No. 1, pp. 1-14, January

Transactions on Computers, Vol. 41, pp. 516-525, May 1992.

684-694, May 1993.

18. “Fault-Tolerant Design Strategies for High Reliability and Safety” (with N. Vaidya),
IEEE Transactions on Computers, Vol. 42, No. 10, October 1993.

19. “A New Class of Bit and Byte Error Control Codes” (with N. Vaidya), IEEE
Transactions on Information Theory, Sept. 1992.

20. “Yield Optimization in Large RAMs with Hierarchical Redundancy” (with K.N.
Ganapathy and A.D. Singh), IEEE Journal of Solid State Circuits, Vol. 26, No. 9,
pp. 1259-1264, September 1991.

21. “A New Framework for Designing and Analyzing BIST Techniques and Zero Aliasing
Compression” (with S.K. Gupta), IEEE Transactions on Computers, Vol. 40, No. 6,
pp. 743-763, June 1991.

22. “Consensus with Dual Mode Failures” (with F.J. Meyer), IEEE Transactions on
Parallel and Distributed Systems, Vol. 2, No. 2, pp. 214-222, April 1991.

23. “Error Correcting Codes in Fault-Tolerant Computers” (with E. Fujiwara), Computer,
Vol. 23, No. 7, pp. 63-72, July 1990.

Technique” (with S. Gupta and M. Karpovsky), IEEE Transactions on Computers,
Vol. 39, pp. 586-591, April 1990.

25. “Organization and Analysis of Gracefully-Degrading Inter-leaved Memory Systems”
(with K. Saluja, G. Sohi and K. Cheung), IEEE Transactions on Computers, Vol. 39,
No. 1, pp. 63-71, January 1990.

26. “Modeling Defect Spatial Distribution” (with F.J. Meyer), IEEE Transactions on
Computers, Vol. 38, No. 4, pp. 538-546, April 1989.

27. “The DeBruijn Multiprocessor Networks: A Versatile Parallel Processing Network
for VLSI” (with M. Samatham), IEEE Transactions on Computers, Vol. 38, No. 4,
pp. 567-581, April 1989.

28. “Dynamic Testing Strategy for Distributed System” (with F.J. Meyer), IEEE
Transactions on Computers, Vol. 38, No. 3, pp. 356-365, March 1989.

29. “TRAM: A Design Methodology for High Performance Testable Large RAMs” (with
N. Jarwala), IEEE Transactions on Computers, Vol. C-37, No. 10, pp. 1235-1250,
October 1988.

30. “Designing Interconnection Buses in VLSI and WSI for Maximum Yield and Minimum
Delay” (with I. Koren and Z. Koren), IEEE Journal of Solid State Circuits, Vol. 23,

32. “Modeling the Effect of Redundancy on Yield and Performance of VLSI Systems”
(with I. Koren), IEEE Transactions on Computers, Vol. C-36, No. 3, pp. 344-355,

33. “Yield and Performance Enhancement through Redundancy in VLSI and WSI
Multiprocessor Systems” (with I. Koren), IEEE Proceedings, Vol. 74, No. 5, pp. 699-711,

34. “Dynamically Restructurable Fault-tolerant Processor Network Architectures”, IEEE
Transactions on Computers, Vol. C-34, No. 5, pp. 434-447, May 1985.

35.  “Fault-tolerant  Multiprocessor  Structures”,  IEEE  Transactions  on  Computers,  Vol.  C- 
34,  No.  1,  pp.  33-45,  January  1985. 

36.  “Synthesis  of  Directed  Multi-Commodity  Flow  Problems”  (with  A.  Itai),  Networks, 
Vol.  14,  pp.  213-224,  1984. 

37.  “Sequential  Network  Design  Using  Extra  Inputs  for  Fault  Detection”,  IEEE
Transactions  on  Computers,  Vol.  C-32,  No.  3,  pp.  319-323,  March  1983.

38.  “A  Fault-Tolerant  Distributed  Processor  Communication  Architecture”  (with  S.
Reddy),  IEEE  Transactions  on  Computers,  Vol.  C-31,  No.  9,  pp.  863-870,  September 
1982. 

39.  “A  Class  of  Unidirectional  Error  Correcting  Codes”,  IEEE  Transactions  on  Computers, 
Special  Issue  on  Fault-Tolerant  Computing,  Vol.  C-32,  No.  6,  pp.  564-568,  June  1982.

40.  “A  Uniform  Representation  of  Permutation  Networks  Used  in  Memory-Processor
Interconnection”  (with  K.L.  Kodandapani),  IEEE  Transactions  on  Computers,  Special
Issue  on  Parallel  Processing,  Vol.  C-29,  No.  9,  pp.  777-791,  September  1980. 

41.  “A  New  Class  of  Error  Correcting-Detecting  Codes  for  Fault-Tolerant  Computer  Ap¬ 
plications”,  IEEE  Transactions  on  Computers,  Special  Issue  on  Fault-Tolerant  Com¬ 
puting,  Vol.  C-29,  No.  6,  pp.  471-481,  June  1980. 

42.  “Error-Correcting  Codes  and  Self-Checking  Circuits”  (with  J.J.  Stiffler),  IEEE
Computer,  Special  Issue  on  Fault-Tolerant  Computing,  Vol.  13,  No.  3,  pp.  27-38,  March
1980. 

43.  “Undetectability  of  Bridging  Faults  and  Validity  of  Stuck-at  Fault  Test  Sets”  (with
K.L.  Kodandapani),  IEEE  Transactions  on  Computers,  Vol.  C-29,  No.  1,  pp.  55-59,
January  1980. 

44.  “Fault-Tolerant  Asynchronous  Networks  Using  Read-Only  Memories”,  IEEE  Transac¬ 
tions  on  Computers,  Vol.  C-27,  No.  7,  pp.  674-679,  July  1978. 

45.  “Fault  Secure  Asynchronous  Networks”,  IEEE  Transactions  on  Computers,  Vol.  C-27, 
No.  5,  pp.  396-404,  May  1978. 

46.  “A  Theory  of  Galois  Switching  Functions”,  IEEE  Transactions  on  Computers,  Vol.  C- 
27,  No.  3,  pp.  239-249,  March  1978. 

47.  “Universal  Test  Sets  for  Multiple  Fault  Detection  in  AND-EXOR  Arrays”,  IEEE 
Transactions  on  Computers,  Vol.  C-27,  No.  2,  pp.  181-187,  February  1978.

48.  “Store  Address  Generator  with  Built-In  Fault  Detection  Capabilities”  (with  M.Y. 
Hsiao  and  A.M.  Patel),  IEEE  Transactions  on  Computers,  Vol.  C-26,  No.  11,  pp.  1144-
1147,  November  1977. 

49.  “A  Graph-Structural  Approach  for  the  Generalization  of  Data  Management  Systems”, 
Information  Sciences,  American  Elsevier  Publishing  Company,  Inc.,  pp.  1-17,  March
1977. 

50.  “Techniques  to  Construct  (2,1)  Separating  Systems  from  Linear  Codes”  (with  S.M.
Reddy),  IEEE  Transactions  on  Computers,  Vol.  C-25,  No.  9,  pp.  945-949,  September 
1976. 

51.  “Reed-Muller  Canonic  Forms  for  Multivalued  Functions”  (with  A.M.  Patel),  IEEE 
Transactions  on  Computers,  Vol.  C-24,  No.  2,  pp.  206-220,  February  1975. 

52.  “Fault-Tolerant  Carry  Save  Adders”,  IEEE  Transactions  on  Computers,  Vol.  C-23, 
No.  11,  pp.  1320-1322,  November  1974. 

53.  “Design  of  Two-Level  Fault-Tolerant  Networks”  (with  S.M.  Reddy),  IEEE  Transac¬ 
tions  on  Computers,  Vol.  C-23,  No.  1,  pp.  41-48,  June  1974. 

54.  “Fault-Tolerant  Asynchronous  Networks”  (with  S.M.  Reddy),  IEEE  Transactions  on 
Computers,  Vol.  C-22,  No.  7,  pp.  662-669,  July  1973. 

55.  “Error  Correcting  Techniques  for  Logic  Processors”  (with  S.M.  Reddy),  IEEE  Trans¬ 
actions  on  Computers,  Vol.  C-21,  No.  12,  pp.  1331-1335,  December  1972. 

In  Conference  Proceedings 

1.  “Synthesis  of  Initializable  Asynchronous  Circuits”,  (with  S.  Chakradhar,  S.  Banerjee
and  R.  Roy),  International  Conference  on  VLSI  Design,  Calcutta,  India,  December 
1993. 

2.  “Recovery  in  Distributed  Mobile  Environments”  (with  P.  Krishna  and  N.H.  Vaidya), 
IEEE  Workshop  on  Advances  in  Parallel  and  Distributed  Systems,  October  1993. 

3.  “A  Method  to  Derive  Compact  Test  Sets  for  Path  Delay  Faults  in  Combinational
Circuits”  (with  J.  Saxena),  1993  International  Conference  on  Computer  Design,
Cambridge,  Massachusetts,  pp.  518-522,  October  4-6,  1993.

4.  “Design  for  Testability  of  Asynchronous  Sequential  Circuits”,  (with  J.  Saxena),  Inter¬ 
national  Test  Conference,  Baltimore,  October  17-21,  1993. 

5.  “Fast  and  Efficient  Strategies  for  Cubic  and  Non-Cubic  Allocation  in  Hypercube
Multiprocessors”,  (with  D.  Das  Sharma),  1993  International  Conference  on  Parallel
Processing,  Chicago,  August  1993.

6.  “A  Synthesis  and  Evaluation  Tool  for  Fault-Tolerant  Multiprocessor  Architectures” 
(with  J.  Clark),  Annual  Reliability  and  Maintainability  Symposium,  pp.  428-435,  Jan¬ 
uary  1993. 

7.  “Buffer  Assignment  for  Data  Driven  Architectures,”  (with  M.  Chatterjee),
International  Conference  on  Computer  Aided  Design  ’93,  November  1993.

8.  “A  Fast  and  Efficient  Strategy  for  Submesh  Allocation  in  Mesh-Connected  Parallel
Computers”,  (with  D.  Das  Sharma),  5th  IEEE  Symposium  on  Parallel  and  Distributed
Processing,  December  1993. 

9.  “Optimal  Broadcasting  in  de  Bruijn  Networks  and  Hyper-de  Bruijn  Networks”  (with 
E.  Ganesan),  International  Parallel  Processing  Symposium,  April  1993. 

10.  “Degradable  Agreement  in  the  Presence  of  Byzantine  Faults”  (with  N.  Vaidya),  13th
International  Conference  on  Distributed  Computing  Systems,  Pittsburgh,  Pennsylva¬ 

11.  “A  Novel  Approach  for  Subcube  Allocation  in  Hypercube  Multiprocessor”,  (with  D.

13.  “Roll-Forward  Checkpointing  Scheme:  Concurrent  Retry  with  Nondedicated  Spares”